Abstract. We consider two-player stochastic games played on a finite state space
for an infinite number of rounds. The games are concurrent: in each round, the
two players (player 1 and player 2) choose their moves independently and
simultaneously; the current state and the two moves determine a probability
distribution over the successor states. We also consider the important special case of
turn-based stochastic games where players make moves in turns, rather than
concurrently. We study concurrent games with $\omega$-regular winning conditions
specified as parity objectives. The value for player 1 for a parity objective is the
maximal probability with which the player can guarantee the satisfaction of the
objective against all strategies of the opponent. We study the problem of
continuity and robustness of the value function in concurrent and turn-based
stochastic parity games with respect to imprecision in the transition probabilities. We
present quantitative bounds on the difference of the value function (in terms of
the imprecision of the transition probabilities) and show the value continuity for
structurally equivalent concurrent games (two games are structurally equivalent
if the supports of the transition functions are the same and the probabilities
differ). We also show robustness of optimal strategies for structurally equivalent
turn-based stochastic parity games. Finally, we show that the value continuity
property breaks without the structural equivalence assumption (even for Markov
chains) and show that our quantitative bound is asymptotically optimal. Hence
our results are tight (the assumption is both necessary and sufficient) and optimal
(our quantitative bound is asymptotically optimal).



1 Introduction 



Concurrent stochastic games are played by two players on a finite state space for an
infinite number of rounds. In every round, the two players simultaneously and
independently choose moves (or actions), and the current state and the two chosen moves
determine a probability distribution over the successor states. The outcome of the game
(or a play) is an infinite sequence of states. These games were introduced by
Shapley [24], and have been one of the most fundamental and well-studied game models
in stochastic graph games. We consider $\omega$-regular objectives specified as parity
objectives; that is, given an $\omega$-regular set $\Phi$ of infinite state sequences, player 1 wins if the
outcome of the game lies in $\Phi$. Otherwise, player 2 wins, i.e., the game is zero-sum. The
class of concurrent stochastic games subsumes many other important classes of games
as sub-classes: (1) turn-based stochastic games, where in every round only one player
chooses moves (i.e., the players make moves in turns); and (2) Markov decision
processes (MDPs) (one-player stochastic games). Concurrent games and the sub-classes
provide a rich framework to model various classes of dynamic reactive systems, and
$\omega$-regular objectives provide a robust specification language to express all commonly
used properties in verification; moreover, all $\omega$-regular objectives can be expressed as
parity objectives. Thus concurrent games with parity objectives provide the mathematical
framework to study many important problems in the synthesis and verification of
reactive systems [6,23,21] (see also [1,14,2]).

The player-1 value $v_1(s)$ of the game at a state $s$ is the limit probability with which
player 1 can ensure that the outcome of the game lies in $\Phi$; that is, the value $v_1(s)$ is
the maximal probability with which player 1 can guarantee $\Phi$ against all strategies of
player 2. Symmetrically, the player-2 value $v_2(s)$ is the limit probability with which
player 2 can ensure that the outcome of the game lies outside $\Phi$. The computational
complexity of MDPs, turn-based stochastic games, and concurrent games with parity
objectives has received a lot of attention in the literature. Markov
decision processes with $\omega$-regular objectives have been studied in [8,9,4]; the
results show the existence of pure (deterministic) memoryless (stationary) optimal strategies
for parity objectives, and the values can be computed in polynomial
time. Turn-based stochastic games with the special case of reachability objectives have
been studied in [7]: the existence of pure memoryless optimal strategies has been
established, and the decision problem of whether the value at a state is at least a given
rational value lies in NP $\cap$ coNP. The existence of pure memoryless optimal
strategies for turn-based stochastic games with parity objectives was established in [5,28],
and again the decision problem lies in NP $\cap$ coNP. Concurrent parity games have been
studied in [10,12,3,15]: optimal strategies need not exist, $\varepsilon$-optimal strategies
(for $\varepsilon > 0$) require both infinite memory and randomization
in general, and the decision problem can be solved in PSPACE.

Almost all results in the literature consider the problem of computing values and 
optimal strategies when the game model is given precisely along with the objective. 
However, it is often unrealistic to know the precise probabilities of transition which 
are only estimated through observation. Since the transition probabilities are not known 
precisely, an extremely important question is how robust is the analysis of concurrent 
games and its sub-classes with parity objectives with respect to small changes in the 
transition probabilities. This question has been largely ignored in the study of con- 
current and turn-based stochastic parity games. In this paper we study the following 
problems related to continuity and robustness of values: (1) (continuity of values): un- 
der what conditions can continuity of the value function be proved for concurrent parity 
games; (2) (robustness of values): can quantitative bounds be obtained on the differ- 
ence of the value function in terms of the difference of the transition probabilities; and 
(3) (robustness of optimal strategies): do optimal strategies of a game remain $\varepsilon$-optimal,
for $\varepsilon > 0$, if the transition probabilities are slightly changed.

Our contributions. Our contributions are as follows: 



1. We consider structurally equivalent game structures, where the supports of the
transition probabilities are the same, but the precise transition probabilities may differ.
We show the following results for structurally equivalent concurrent parity games: 

(a) Quantitative bound. We present a quantitative bound on the difference of the 
value functions of two structurally equivalent game structures in terms of the 
difference of the transition probabilities. We show that when the difference in the
transition probabilities is small, our bound is asymptotically optimal. Our
example to show the matching lower bound is on a Markov chain, and thus our
result shows that the bound for a Markov chain can be generalized to
concurrent games.

(b) Value continuity. We show value continuity for structurally equivalent concur- 
rent parity games, i.e., as the difference in the transition probabilities goes to 0, 
the difference in value functions also goes to 0. We then show that the structural 
equivalence assumption is necessary: we show a family of Markov chains (that 
are not structurally equivalent) where the difference of the transition
probabilities goes to 0, but the difference in the value functions is 1. It follows that the
structural equivalence assumption is both necessary (even for Markov chains) 
and sufficient (even for concurrent games). 

It follows from the above that our results are both optimal (quantitative bounds) as well
as tight (assumption both necessary and sufficient). Our result for concurrent
parity games is also a significant quantitative generalization of a result for concurrent
parity games of [10], which shows that the set of states with value 1 remains the same
if the games are structurally equivalent. We also argue that the structural equiv- 
alence assumption is not unrealistic in many cases: a reactive system consists of 
many state variables, and given a state (valuation of variables) it is typically known 
which variables are possibly updated, and what is unknown is the precise transition 
probabilities (which are estimated by observation). Thus the system that is obtained 
for analysis is structurally equivalent to the underlying original system and it only 
differs in precise transition probabilities. 

2. For turn-based stochastic parity games the value continuity and the quantitative
bounds are the same as for concurrent games. We also prove a stronger result for
structurally equivalent turn-based stochastic games: along with continuity
of the value function, there is also a robustness property for pure memoryless
optimal strategies. More precisely, for all $\varepsilon > 0$, we present a bound $\beta > 0$ such that
any pure memoryless optimal strategy in a turn-based stochastic parity game is an
$\varepsilon$-optimal strategy in every structurally equivalent turn-based stochastic game
whose transition probabilities differ by at most $\beta$. This result is significant because
it allows the rich literature on turn-based stochastic games to carry over
robustly to structurally equivalent turn-based stochastic games. As argued before,
the model of a turn-based stochastic game obtained for analysis may differ slightly
in the precise transition probabilities, and our results show that the analysis of the
slightly imprecise model using the classical results carries over to the underlying
original system with small error bounds.

Our results are obtained as follows. The result of [11] shows that the value function for
concurrent parity games can be characterized as the limit of the value function of
concurrent multi-discounted games (concurrent discounted games with different discount
factors associated with every state). There exist bounds on the difference of the value
functions of discounted games [16]; however, these bounds depend on the discount factor, and
in the limit give trivial bounds (and in general this approach does not work, as value
continuity cannot be proven in general and the structural equivalence assumption is necessary).
We use a classical result on Markov chains by Friedlin and Wentzell [17] and generalize
a result of Solan [25] from Markov chains with a single discount to Markov chains with
multi-discounted objectives to obtain a bound that is independent of the discount factor
for structurally equivalent games. The bound then also applies when we take the limit
of the discount factors, and gives us the desired bound.

Our paper is organized as follows: in Section 2 we present the basic definitions; in
Section 3 we consider Markov chains with multi-discounted and parity objectives; in
Section 4 (Subsection 4.1) we prove the results related to turn-based stochastic games
(item (2) of our contributions); and finally in Subsection 4.2 we present the quantitative
bound and value continuity for concurrent games, along with the two examples that
illustrate the asymptotic optimality of the bound and the necessity of the structural
equivalence assumption. Detailed proofs are presented in the appendix.

2 Definitions 

In this section we define game structures, strategies, objectives, values and present other 
preliminary definitions. 

Probability distributions. For a finite set $A$, a probability distribution on $A$ is a
function $\delta : A \to [0,1]$ such that $\sum_{a \in A} \delta(a) = 1$. We denote the set of
probability distributions on $A$ by $\mathcal{D}(A)$. Given a distribution $\delta \in \mathcal{D}(A)$, we denote by
$\mathrm{Supp}(\delta) = \{x \in A \mid \delta(x) > 0\}$ the support of the distribution $\delta$.

Concurrent game structures. A (two-player) concurrent stochastic game structure
$G = (S, A, \Gamma_1, \Gamma_2, \delta)$ consists of the following components.

- A finite state space $S$ and a finite set $A$ of moves (or actions).

- Two move assignments $\Gamma_1, \Gamma_2 : S \to 2^A \setminus \emptyset$. For $i \in \{1,2\}$, assignment $\Gamma_i$
associates with each state $s \in S$ the nonempty set $\Gamma_i(s) \subseteq A$ of moves available to
player $i$ at state $s$.

- A probabilistic transition function $\delta : S \times A \times A \to \mathcal{D}(S)$, which associates with
every state $s \in S$ and moves $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$ a probability distribution
$\delta(s, a_1, a_2) \in \mathcal{D}(S)$ for the successor state.

Plays. At every state $s \in S$, player 1 chooses a move $a_1 \in \Gamma_1(s)$, and
simultaneously and independently player 2 chooses a move $a_2 \in \Gamma_2(s)$. The game then proceeds
to the successor state $t$ with probability $\delta(s, a_1, a_2)(t)$, for all $t \in S$. For all states
$s \in S$ and moves $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$, we indicate by $\mathrm{Dest}(s, a_1, a_2) =
\mathrm{Supp}(\delta(s, a_1, a_2))$ the set of possible successors of $s$ when moves $a_1, a_2$ are selected.
A path or a play of $G$ is an infinite sequence $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ of states in $S$
such that for all $k \ge 0$, there are moves $a_1^k \in \Gamma_1(s_k)$ and $a_2^k \in \Gamma_2(s_k)$ such that
$s_{k+1} \in \mathrm{Dest}(s_k, a_1^k, a_2^k)$. We denote by $\Omega$ the set of all paths. We denote by $\theta_i$ the
random variable that denotes the $i$-th state of a path. For a play $\omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega$,
we define $\mathrm{Inf}(\omega) = \{s \in S \mid s_k = s \text{ for infinitely many } k \ge 0\}$ to be the set of states
that occur infinitely often in $\omega$.

Special classes of games. We consider the following special classes of concurrent
games.

1. Turn-based stochastic games. A game structure $G$ is turn-based stochastic if at
every state at most one player can choose among multiple moves; that is, for every
state $s \in S$ there exists at most one $i \in \{1,2\}$ with $|\Gamma_i(s)| > 1$.

2. Markov decision processes. A game structure is a player-1 Markov decision process
(MDP) if for all $s \in S$ we have $|\Gamma_2(s)| = 1$, i.e., only player 1 has a choice of actions
in the game. Similarly, a game structure is a player-2 MDP if for all $s \in S$ we have
$|\Gamma_1(s)| = 1$.

3. Markov chains. A game structure is a Markov chain if for all $s \in S$ we have
$|\Gamma_1(s)| = 1$ and $|\Gamma_2(s)| = 1$. Hence in a Markov chain the players do not matter,
and for the rest of the paper a Markov chain consists of a tuple $(S, \delta)$ where
$\delta : S \to \mathcal{D}(S)$ is the probabilistic transition function.
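
As a small illustration of these definitions, the following Python sketch classifies a game structure from its move assignments alone. The function name `classify` and the dictionary encoding of the two move assignments are assumptions made for this example; they are not notation from the paper.

```python
from typing import Dict, Set

# Hypothetical encoding: each move assignment maps a state to its nonempty move set.
MoveAssignment = Dict[str, Set[str]]


def classify(gamma1: MoveAssignment, gamma2: MoveAssignment) -> str:
    """Return the most specific special class that the move assignments satisfy."""
    states = list(gamma1)
    p1_trivial = all(len(gamma1[s]) == 1 for s in states)  # player 1 never has a choice
    p2_trivial = all(len(gamma2[s]) == 1 for s in states)  # player 2 never has a choice
    if p1_trivial and p2_trivial:
        return "Markov chain"
    if p2_trivial:
        return "player-1 MDP"
    if p1_trivial:
        return "player-2 MDP"
    # Turn-based: at every state at most one player has more than one move.
    turn_based = all(len(gamma1[s]) == 1 or len(gamma2[s]) == 1 for s in states)
    return "turn-based stochastic" if turn_based else "concurrent"
```

Every Markov chain is also an MDP and every MDP is also turn-based, so the function reports only the narrowest class.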

Strategies. A strategy for a player is a recipe that describes how to extend a play.
Formally, a strategy for player $i \in \{1,2\}$ is a mapping $\pi_i : S^+ \to \mathcal{D}(A)$ that associates
with every nonempty finite sequence $x \in S^+$ of states, representing the past history of
the game, a probability distribution $\pi_i(x)$ used to select the next move. The strategy $\pi_i$
can prescribe only moves that are available to player $i$; that is, for all sequences $x \in S^*$
and states $s \in S$, we require that $\mathrm{Supp}(\pi_i(x \cdot s)) \subseteq \Gamma_i(s)$. We denote by $\Pi_i$ the set of
all strategies for player $i \in \{1,2\}$.

Given a state $s \in S$ and two strategies $\pi_1 \in \Pi_1$ and $\pi_2 \in \Pi_2$, we define
$\mathrm{Outcome}(s, \pi_1, \pi_2) \subseteq \Omega$ to be the set of paths that can be followed by the game,
when the game starts from $s$ and the players use the strategies $\pi_1$ and $\pi_2$. Formally,
$\langle s_0, s_1, s_2, \ldots \rangle \in \mathrm{Outcome}(s, \pi_1, \pi_2)$ if $s_0 = s$ and if for all $k \ge 0$ there
exist moves $a_1^k \in \Gamma_1(s_k)$ and $a_2^k \in \Gamma_2(s_k)$ such that (i) $\pi_1(s_0, \ldots, s_k)(a_1^k) > 0$;
(ii) $\pi_2(s_0, \ldots, s_k)(a_2^k) > 0$; and (iii) $s_{k+1} \in \mathrm{Dest}(s_k, a_1^k, a_2^k)$. Once the starting state
$s$ and the strategies $\pi_1$ and $\pi_2$ for the two players have been chosen, the probabilities of
events are uniquely defined [27], where an event $\mathcal{A} \subseteq \Omega$ is a measurable set of paths (see footnote 1).
For an event $\mathcal{A} \subseteq \Omega$, we denote by $\mathrm{Pr}_s^{\pi_1,\pi_2}(\mathcal{A})$ the probability that a path belongs to $\mathcal{A}$
when the game starts from $s$ and the players use the strategies $\pi_1$ and $\pi_2$.

Classification of strategies. We consider the following special classes of strategies.

1. (Pure). A strategy $\pi$ is pure (deterministic) if for all $x \in S^+$ there exists $a \in A$
such that $\pi(x)(a) = 1$. Thus, deterministic strategies are equivalent to functions
$S^+ \to A$.

2. (Finite-memory). Strategies in general are history-dependent and can be
represented as follows: let $M$ be a set called memory to remember the history of plays
(the set $M$ can be infinite in general). A strategy with memory can be described as
a pair of functions: (a) a memory update function $\pi_u : S \times M \to M$ that, given
the memory $M$ with the information about the history and the current state, updates
the memory; and (b) a next move function $\pi_n : S \times M \to \mathcal{D}(A)$ that, given the
memory and the current state, specifies the next move of the player. A strategy is
finite-memory if the memory $M$ is finite.

3. (Memoryless). A memoryless strategy is independent of the history of play and
only depends on the current state. Formally, for a memoryless strategy $\pi$ we have
$\pi(x \cdot s) = \pi(s)$ for all $s \in S$ and all $x \in S^*$. Thus memoryless strategies are
equivalent to functions $S \to \mathcal{D}(A)$.

4. (Pure memoryless). A strategy is pure memoryless if it is both pure and memoryless.
Pure memoryless strategies neither use memory, nor use randomization and are
equivalent to functions $S \to A$.

1 To be precise, we should define events as measurable sets of paths sharing the same initial
state, and we should replace our events with families of events, indexed by their initial state.
However, our (slightly) improper definition leads to more concise notation.

Qualitative objectives. We specify qualitative objectives for the players by providing
the set of winning plays $\Phi \subseteq \Omega$ for each player. In this paper we study only zero-sum
games [22,16], where the objectives of the two players are complementary. A general
class of objectives are the Borel objectives [19]. A Borel objective $\Phi \subseteq S^\omega$ is a Borel set
in the Cantor topology on $S^\omega$. In this paper we consider $\omega$-regular objectives, which lie
in the first $2\frac{1}{2}$ levels of the Borel hierarchy (i.e., in the intersection of $\Sigma_3$ and $\Pi_3$) [26].
All $\omega$-regular objectives can be specified as parity objectives, and hence in this work we
focus on parity objectives, which are defined as follows.

- Parity objectives. For $c, d \in \mathbb{N}$, we let $[c..d] = \{c, c+1, \ldots, d\}$. Let $p : S \to [0..d]$
be a function that assigns a priority $p(s)$ to every state $s \in S$, where $d \in \mathbb{N}$. The
Even parity objective requires that the minimum priority visited infinitely often
is even. Formally, the set of winning plays is defined as
$\mathrm{Parity}(p) = \{\omega \in \Omega \mid \min(p(\mathrm{Inf}(\omega))) \text{ is even}\}$.
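
The Even parity condition is easy to evaluate on an eventually periodic play, because the states visited infinitely often are exactly the states on the repeated cycle. The Python sketch below illustrates this; the function name and the representation of a play by its repeated cycle are assumptions made for the example.

```python
from typing import Dict, Sequence


def satisfies_even_parity(cycle: Sequence[str], priority: Dict[str, int]) -> bool:
    """Check Even parity for a play of the form prefix . cycle^omega.

    The finite prefix is irrelevant here: Inf(play) is the set of states on the cycle.
    """
    if not cycle:
        raise ValueError("the repeated cycle must be nonempty")
    inf_states = set(cycle)
    # The play is winning iff the minimum priority seen infinitely often is even.
    return min(priority[s] for s in inf_states) % 2 == 0
```

For instance, with priorities `{"a": 1, "b": 2, "c": 0}`, a play that eventually loops through `a` and `b` is losing for the Even objective (minimum priority 1), while one that loops through `a` and `c` is winning (minimum priority 0).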

Quantitative objectives. Quantitative objectives are measurable functions $f : \Omega \to \mathbb{R}$.
We will consider multi-discounted objective functions, as there is a close connection
between concurrent games with multi-discounted objectives and concurrent
games with parity objectives. Given a concurrent game structure with state space $S$,
let $\lambda$ be a discount vector that assigns to every $s \in S$ a discount factor $0 \le \lambda(s) < 1$
(unless otherwise mentioned we will always consider discount vectors $\lambda$ such that for
all $s \in S$ we have $0 < \lambda(s) < 1$). Let $r : S \to \mathbb{R}$ be a reward function that assigns a
real-valued reward $r(s)$ to every state $s \in S$. The multi-discounted objective function
$\mathrm{MDT}(\lambda, r) : \Omega \to \mathbb{R}$ maps every path to the mean-discounted reward of the path.
Formally, the function is defined as follows: for a path $\omega = s_0 s_1 s_2 \ldots$ we have

$$\mathrm{MDT}(\lambda, r)(\omega) = \frac{\sum_{j=0}^{\infty} \left(\prod_{i=0}^{j} \lambda(s_i)\right) \cdot r(s_j)}{\sum_{j=0}^{\infty} \left(\prod_{i=0}^{j} \lambda(s_i)\right)}.$$

Also note that a parity objective $\Phi$ can be interpreted as a function $\Phi : \Omega \to \{0, 1\}$ by
simply considering the characteristic function that assigns 1 to paths that belong to $\Phi$
and 0 otherwise.
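
The mean-discounted reward of a path can be approximated numerically by truncating both infinite sums, since the weights $\prod_{i \le j} \lambda(s_i)$ decay geometrically. The following Python sketch does this; the function name, the representation of a path as a function from positions to states, and the `horizon` parameter are assumptions of the example.

```python
from typing import Callable, Dict


def mdt_truncated(path: Callable[[int], str], lam: Dict[str, float],
                  reward: Dict[str, float], horizon: int = 10_000) -> float:
    """Approximate the mean-discounted reward using the first `horizon` positions."""
    numerator = 0.0
    denominator = 0.0
    weight = 1.0
    for j in range(horizon):
        state = path(j)
        weight *= lam[state]          # weight = product of lam(s_i) for i = 0..j
        numerator += weight * reward[state]
        denominator += weight
    return numerator / denominator
```

With a single discount factor 1/2 and a path that alternates between a state of reward 1 and a state of reward 0 (starting in the former), the weights are 1/2, 1/4, 1/8, ..., so the numerator is 1/2 + 1/8 + ... = 2/3 and the denominator is 1, giving a mean-discounted reward of 2/3.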

Values, optimality, $\varepsilon$-optimality. Given an objective $\Phi$ which is a measurable
function $\Phi : \Omega \to \mathbb{R}$, we define the value for player 1 of game $G$ with objective $\Phi$
from the state $s \in S$ as $\mathrm{Val}(G, \Phi)(s) = \sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \mathbb{E}_s^{\pi_1,\pi_2}(\Phi)$; i.e., the
value is the maximal expectation with which player 1 can guarantee the satisfaction
of $\Phi$ against all player 2 strategies. Given a player-1 strategy $\pi_1$, we use the
notation $\mathrm{Val}^{\pi_1}(G, \Phi)(s) = \inf_{\pi_2 \in \Pi_2} \mathbb{E}_s^{\pi_1,\pi_2}(\Phi)$. A strategy $\pi_1$ for player 1 is optimal
for an objective $\Phi$ if for all states $s \in S$, we have $\mathrm{Val}^{\pi_1}(G, \Phi)(s) = \mathrm{Val}(G, \Phi)(s)$.
For $\varepsilon > 0$, a strategy $\pi_1$ for player 1 is $\varepsilon$-optimal if for all states $s \in S$, we have
$\mathrm{Val}^{\pi_1}(G, \Phi)(s) \ge \mathrm{Val}(G, \Phi)(s) - \varepsilon$. The notions of values, optimal and $\varepsilon$-optimal
strategies for player 2 are defined analogously. The following theorem summarizes the results
in the literature related to determinacy and memory complexity of concurrent games and
its sub-classes for parity and multi-discounted objectives.

Theorem 1. The following assertions hold:

1. (Determinacy [20,10]). For all concurrent game structures and for all parity
and multi-discounted objectives $\Phi$ we have
$\sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \mathbb{E}_s^{\pi_1,\pi_2}(\Phi) = \inf_{\pi_2 \in \Pi_2} \sup_{\pi_1 \in \Pi_1} \mathbb{E}_s^{\pi_1,\pi_2}(\Phi)$.

2. (Memory complexity). For all concurrent game structures and for all
multi-discounted objectives $\Phi$, randomized memoryless optimal strategies exist [24]. For
all turn-based stochastic game structures and for all multi-discounted objectives $\Phi$,
pure memoryless optimal strategies exist [16]. For all turn-based stochastic game
structures and for all parity objectives $\Phi$, pure memoryless optimal strategies
exist [5,28]. In general optimal strategies need not exist in concurrent games with
parity objectives, and $\varepsilon$-optimal strategies, for $\varepsilon > 0$, need both randomization and
infinite memory in general [10].



The results of [11] established that the value of concurrent games with certain
special multi-discounted objectives can be characterized as valuations of quantitative
discounted $\mu$-calculus formulas. In the limit, the value function of the discounted $\mu$-calculus
formula characterizes the value function of concurrent games with parity objectives. An
elegant interpretation of the result was given in [18], and from the interpretation we
obtain the following theorem.

Theorem 2 ([11,18]). Let $G$ be a concurrent game structure with a parity objective $\Phi$
defined by a priority function $p$. Let $r$ be a reward function that assigns reward 1 to even
priority states and reward 0 to odd priority states. Then there exists an order $s_1 s_2 \ldots s_n$
on the states (where $S = \{s_1, s_2, \ldots, s_n\}$) dependent only on the priority function
$p$ such that $\mathrm{Val}(G, \Phi) = \lim_{\lambda(s_1) \to 1} \lim_{\lambda(s_2) \to 1} \cdots \lim_{\lambda(s_n) \to 1} \mathrm{Val}(G, \mathrm{MDT}(\lambda, r))$;
in other words, if we consider the value function $\mathrm{Val}(G, \mathrm{MDT}(\lambda, r))$ with the
multi-discounted objective and take the limit of the discount factors to 1 in the order of the
states we obtain the value function for the parity objective.

We now present notions related to structure equivalent game structures and
distances.

Structure equivalent game structures. Given two game structures $G_1 =
(S, A, \Gamma_1, \Gamma_2, \delta_1)$ and $G_2 = (S, A, \Gamma_1, \Gamma_2, \delta_2)$ on the same state and action space,
with different transition functions, we say that $G_1$ and $G_2$ are structure equivalent
(denoted $G_1 \equiv G_2$) if for all $s \in S$ and all $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$ we
have $\mathrm{Supp}(\delta_1(s, a_1, a_2)) = \mathrm{Supp}(\delta_2(s, a_1, a_2))$. Similarly, two Markov chains $G_1 =
(S, \delta_1)$ and $G_2 = (S, \delta_2)$ are structurally equivalent (denoted $G_1 \equiv G_2$) if for all $s \in S$
we have $\mathrm{Supp}(\delta_1(s)) = \mathrm{Supp}(\delta_2(s))$. For a game structure $G$ (resp. Markov chain
$G$), we denote by $[G]_{\equiv}$ the set of all game structures (resp. Markov chains) that are
structurally equivalent to $G$.
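
For Markov chains the structural equivalence test is a direct comparison of supports, as the Python sketch below shows. The nested-dictionary encoding of a transition function and the names `support` and `structurally_equivalent` are assumptions of this example.

```python
from typing import Dict, Set

# Hypothetical encoding: chain[s][t] is the probability of moving from s to t.
Chain = Dict[str, Dict[str, float]]


def support(dist: Dict[str, float]) -> Set[str]:
    """States that receive strictly positive probability."""
    return {t for t, p in dist.items() if p > 0}


def structurally_equivalent(d1: Chain, d2: Chain) -> bool:
    """True iff both chains share a state space and have equal supports at every state."""
    if d1.keys() != d2.keys():
        return False
    return all(support(d1[s]) == support(d2[s]) for s in d1)
```

Note that an explicit entry with probability 0 is treated the same as a missing entry, so two encodings of the same chain always compare as equivalent.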



Ratio and absolute distances. Given two game structures $G_1 = (S, A, \Gamma_1, \Gamma_2, \delta_1)$
and $G_2 = (S, A, \Gamma_1, \Gamma_2, \delta_2)$, the absolute distance of the game structures is the
maximum absolute difference in the transition probabilities. Formally,

$$\mathrm{dist}_A(G_1, G_2) = \max_{s,t \in S,\ a \in \Gamma_1(s),\ b \in \Gamma_2(s)} |\delta_1(s,a,b)(t) - \delta_2(s,a,b)(t)|.$$

The absolute distance for two Markov chains $G_1 = (S, \delta_1)$ and $G_2 = (S, \delta_2)$ is
$\mathrm{dist}_A(G_1, G_2) = \max_{s,t \in S} |\delta_1(s)(t) - \delta_2(s)(t)|$. We now define the ratio distance
between two structurally equivalent game structures and Markov chains. Let $G_1$ and $G_2$ be two
structurally equivalent game structures. The ratio distance is defined on the ratio of the
transition probabilities. Formally,

$$\mathrm{dist}_R(G_1, G_2) = \max\left\{ \frac{\delta_1(s,a,b)(t)}{\delta_2(s,a,b)(t)}, \frac{\delta_2(s,a,b)(t)}{\delta_1(s,a,b)(t)} \;\middle|\; s \in S,\ a \in \Gamma_1(s),\ b \in \Gamma_2(s),\ t \in \mathrm{Supp}(\delta_1(s,a,b)) = \mathrm{Supp}(\delta_2(s,a,b)) \right\} - 1.$$

The ratio distance between two structurally equivalent Markov chains $G_1$ and $G_2$ is

$$\max\left\{ \frac{\delta_1(s)(t)}{\delta_2(s)(t)}, \frac{\delta_2(s)(t)}{\delta_1(s)(t)} \;\middle|\; s \in S,\ t \in \mathrm{Supp}(\delta_1(s)) = \mathrm{Supp}(\delta_2(s)) \right\} - 1.$$

Remarks about the distance functions. We first remark that the ratio distance is
not necessarily a metric. Consider the Markov chain with state space $S = \{s, s'\}$
and let $\varepsilon \in (0, 1/7)$. For $k = 1, 2, 5$ consider the transition functions $\delta_k$ such that
$\delta_k(t)(s) = 1 - \delta_k(t)(s') = k \cdot \varepsilon$, for all $t \in S$. Let $G_k$ be the Markov chain with
transition function $\delta_k$. Then we have $\mathrm{dist}_R(G_1, G_2) = 1$, $\mathrm{dist}_R(G_2, G_5) = \frac{3}{2}$ and
$\mathrm{dist}_R(G_1, G_5) = 4$, and hence $\mathrm{dist}_R(G_1, G_2) + \mathrm{dist}_R(G_2, G_5) < \mathrm{dist}_R(G_1, G_5)$.
The above example is from [25]. Also note that $\mathrm{dist}_R$ is only defined for structurally
equivalent game structures, and without the assumption $\mathrm{dist}_R$ is $\infty$. We also remark
that the absolute distance that measures the difference in the transition probabilities is
the most intuitive measure for the difference of two game structures.
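
The failure of the triangle inequality in this example can be checked with exact rational arithmetic. The Python sketch below instantiates the three chains for the concrete choice $\varepsilon = 1/10$ (any value in $(0, 1/7)$ would do); the helper names `chain` and `ratio_distance` are assumptions of this example, and the code presupposes that its two arguments are structurally equivalent.

```python
from fractions import Fraction


def ratio_distance(d1, d2):
    """Ratio distance of two structurally equivalent Markov chains given as nested dicts."""
    worst = Fraction(0)
    for s in d1:
        for t, p in d1[s].items():
            q = d2[s][t]              # same support by assumption, so q > 0 whenever p > 0
            worst = max(worst, p / q, q / p)
    return worst - 1


def chain(k, eps):
    """Chain G_k of the example: every state moves to s with probability k*eps."""
    row = {"s": k * eps, "s'": 1 - k * eps}
    return {"s": dict(row), "s'": dict(row)}


eps = Fraction(1, 10)
g1, g2, g5 = chain(1, eps), chain(2, eps), chain(5, eps)
d12 = ratio_distance(g1, g2)   # 1
d25 = ratio_distance(g2, g5)   # 3/2
d15 = ratio_distance(g1, g5)   # 4
violates_triangle = d12 + d25 < d15
```

In each pair the maximum ratio is attained at the transition into $s$ (ratios 2, 5/2 and 5), which is why the restriction $\varepsilon < 1/7$ matters: it keeps the ratios at $s'$ below those values.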

Proposition 1. Let $G_1$ be a game structure (resp. Markov chain) such that the minimum
positive transition probability is $\eta > 0$. For all game structures (resp. Markov chains)
$G_2 \in [G_1]_{\equiv}$ we have $\mathrm{dist}_R(G_1, G_2) \le \frac{\mathrm{dist}_A(G_1, G_2)}{\eta}$.

Notation for fixing strategies. Given a concurrent game structure $G =
(S, A, \Gamma_1, \Gamma_2, \delta)$, let $\pi_1$ be a randomized memoryless strategy. Fixing the strategy $\pi_1$
in $G$ we obtain a player-2 MDP, denoted as $G \upharpoonright \pi_1$, defined as follows: (1) the state
space is $S$; (2) for all $s \in S$ we have $\Gamma_1(s) = \{\bot\}$ (hence it is a player-2 MDP); (3)
the new transition function $\delta_{\pi_1}$ is defined as follows: for all $s \in S$ and all $b \in \Gamma_2(s)$
we have $\delta_{\pi_1}(s, \bot, b)(t) = \sum_{a \in \Gamma_1(s)} \pi_1(s)(a) \cdot \delta(s, a, b)(t)$. Similarly, if we fix a
randomized memoryless strategy $\pi_1$ in an MDP $G$ we obtain a Markov chain, denoted as
$G \upharpoonright \pi_1$. The following proposition is straightforward to verify from the definitions.

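Proposition 2. Let $G_1$ and $G_2$ be two concurrent game structures (resp. MDPs)
that are structurally equivalent. Let $\pi_1$ be a randomized memoryless strategy. Then
$\mathrm{dist}_A(G_1 \upharpoonright \pi_1, G_2 \upharpoonright \pi_1) \le \mathrm{dist}_A(G_1, G_2)$ and
$\mathrm{dist}_R(G_1 \upharpoonright \pi_1, G_2 \upharpoonright \pi_1) \le \mathrm{dist}_R(G_1, G_2)$.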
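
The construction of the chain obtained by fixing a memoryless strategy, and the absolute-distance part of Proposition 2, can be illustrated on a small one-player example. In the Python sketch below the encoding `delta[s][a][t]` of an MDP, the encoding `pi[s][a]` of a memoryless strategy, and all function names are assumptions of the example; only the averaging formula comes from the text.

```python
from fractions import Fraction as F


def fix_strategy(delta, pi):
    """Markov chain obtained from the MDP delta[s][a][t] under the memoryless strategy pi[s][a]."""
    chain = {}
    for s, moves in delta.items():
        row = {}
        for a, dist in moves.items():
            for t, p in dist.items():
                # Average the successor distributions with the weights of the strategy.
                row[t] = row.get(t, F(0)) + pi[s][a] * p
        chain[s] = row
    return chain


def dist_abs_mdp(d1, d2):
    """Absolute distance of two MDPs over the same states and moves."""
    return max(abs(d1[s][a].get(t, 0) - d2[s][a].get(t, 0))
               for s in d1 for a in d1[s] for t in set(d1[s][a]) | set(d2[s][a]))


def dist_abs_chain(c1, c2):
    """Absolute distance of two Markov chains over the same states."""
    return max(abs(c1[s].get(t, 0) - c2[s].get(t, 0))
               for s in c1 for t in set(c1[s]) | set(c2[s]))


mdp1 = {"u": {"a": {"u": F(1, 2), "v": F(1, 2)}, "b": {"v": F(1)}},
        "v": {"a": {"u": F(1, 4), "v": F(3, 4)}}}
mdp2 = {"u": {"a": {"u": F(1, 3), "v": F(2, 3)}, "b": {"v": F(1)}},
        "v": {"a": {"u": F(1, 2), "v": F(1, 2)}}}
pi = {"u": {"a": F(1, 2), "b": F(1, 2)}, "v": {"a": F(1)}}

before = dist_abs_mdp(mdp1, mdp2)
after = dist_abs_chain(fix_strategy(mdp1, pi), fix_strategy(mdp2, pi))
```

Here `before` is 1/4 and `after` is also 1/4: the difference at state `u` shrinks from 1/6 to 1/12 because it is averaged with a move on which the two MDPs agree, while the difference at `v` is unchanged.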



3 Markov Chains with Multi-discounted and Parity Objectives 

In this section we consider Markov chains with multi-discounted and parity objectives. 
We present a bound on the difference of value functions of two structurally equivalent 
Markov chains that is dependent on the distance between the Markov chains and is in- 
dependent of the discount factors. The result for parity objectives is then a consequence 
of our result for multi-discounted objectives and Theorem 2. Our result crucially
depends on a result of Friedlin and Wentzell for Markov chains; we present this result
below, and then use it to obtain the main result of the section.

Result of Friedlin and Wentzell. Let $(S, \delta)$ be a Markov chain and let $s$ be the initial
state. Let $C \subset S$ be a proper subset of $S$ and let us denote by $ex_C = \inf\{n \in \mathbb{N} \mid
\theta_n \notin C\}$ the first hitting time to the set $S \setminus C$ of states (or the first exit time from
set $C$) (recall that $\theta_n$ is the random variable that denotes the $n$-th state of a path). Let
$\mathcal{F}(C, S) = \{f : C \to S\}$ denote the set of all functions from $C$ to $S$. For every
$f \in \mathcal{F}(C, S)$ we define a directed graph $G_f = (S, E_f)$ where $(s, t) \in E_f$ iff $f(s) = t$.
Let $\alpha_f = 1$ if the directed graph $G_f$ has no directed cycles (i.e., $G_f$ is a directed acyclic
graph); and $\alpha_f = 0$ otherwise. Observe that since $f$ is a function, for every $s \in C$ there
is exactly one path that starts at $s$. For every $s \in C$ and every $t \in S$, let $\beta_f(s, t) = 1$ if
the directed path that leaves $s$ in $G_f$ reaches $t$, otherwise $\beta_f(s, t) = 0$. We now state a
result that can be obtained as a special case of the result from Friedlin and Wentzell [17].
Below we use the formulation of the result as presented in [25] (Lemma 2 of [25]).

Theorem 3 (Friedlin-Wentzell result [17]). Let $(S, \delta)$ be a Markov chain, and let $C \subset
S$ be a proper subset of $S$ such that $\mathrm{Pr}_s(ex_C < \infty) > 0$ for every $s \in C$ (i.e., from
all $s \in C$ with positive probability the first hitting time to the complement set is finite).
Then for every initial state $s_1 \in C$ and for every $t \notin C$ we have

$$\mathrm{Pr}_{s_1}[\theta_{ex_C} = t] = \frac{\sum_{f \in \mathcal{F}(C,S)} \left(\beta_f(s_1, t) \cdot \prod_{s \in C} \delta(s)(f(s))\right)}{\sum_{f \in \mathcal{F}(C,S)} \left(\alpha_f \cdot \prod_{s \in C} \delta(s)(f(s))\right)} \qquad (1)$$

in other words, the probability that the exit state is $t$ when the starting state is $s_1$ is
given by the expression on the right hand side (very informally the right hand side is
the normalized polynomial expression for exit probabilities).
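
For very small chains the right hand side of Equation (1) can be evaluated by brute-force enumeration of all functions $f : C \to S$. The Python sketch below does this with exact fractions. The function names and the dictionary encoding are assumptions of the example. One further assumption should be stated loudly: the numerator loop only counts acyclic $f$ (each term is weighted by $\alpha_f$ as well as $\beta_f$), which is this sketch's reading of how the two indicator terms combine; for the example chain the choice does not change the result.

```python
from fractions import Fraction as F
from itertools import product


def follow(f, start, C):
    """Follow the unique G_f-path from start; return the first state outside C, or None on a cycle."""
    seen = set()
    cur = start
    while cur in C:
        if cur in seen:
            return None
        seen.add(cur)
        cur = f[cur]
    return cur


def exit_probability_by_graphs(delta, C, start, target):
    """Evaluate the graph formula for Pr_start[exit state = target] by enumeration."""
    states = sorted(delta)
    inner = sorted(C)
    num = den = F(0)
    for images in product(states, repeat=len(inner)):
        f = dict(zip(inner, images))
        weight = F(1)
        for s in inner:
            weight *= delta[s].get(f[s], F(0))
        if weight == 0:
            continue
        # alpha_f: no state of C lies on a cycle of G_f.
        if all(follow(f, s, inner) is not None for s in inner):
            den += weight
            # beta_f(start, target), counted here only for acyclic f (assumption, see lead-in).
            if follow(f, start, inner) == target:
                num += weight
    return num / den


delta = {"u": {"u": F(1, 2), "v": F(1, 4), "x": F(1, 4)},
         "v": {"u": F(1, 2), "y": F(1, 2)},
         "x": {"x": F(1)}, "y": {"y": F(1)}}
p_x = exit_probability_by_graphs(delta, {"u", "v"}, "u", "x")
```

For this chain the value agrees with the direct computation: the exit probabilities to `x` satisfy $p_u = \frac14 + \frac12 p_u + \frac14 p_v$ and $p_v = \frac12 p_u$, whose solution is $p_u = \frac23$.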

Value function difference for Markov chains. We will use the result of Theorem 3
to obtain bounds on the value functions of Markov chains. We start with the notion of
mean-discounted time.

Mean-discounted time. Given a Markov chain $(S, \delta)$ and a discount vector $\lambda$, we
define for every state $s \in S$ the mean-discounted time the process is in the state $s$. We
first define the mean-discounted time function $\mathrm{MDT}(\lambda, s) : \Omega \to \mathbb{R}$ that maps every
path to the mean-discounted time that the state $s$ is visited, and the function is formally
defined as follows: for a path $\omega = s_0 s_1 s_2 \ldots$ we have

$$\mathrm{MDT}(\lambda, s)(\omega) = \frac{\sum_{j=0}^{\infty} \left(\prod_{i=0}^{j} \lambda(s_i)\right) \cdot \mathbf{1}_{s_j = s}}{\sum_{j=0}^{\infty} \left(\prod_{i=0}^{j} \lambda(s_i)\right)},$$

where $\mathbf{1}_{s_j = s}$ is the indicator function. The expected mean-discounted time function for
a Markov chain $G$ with transition function $\delta$ is defined as follows: $\mathrm{MT}(s_1, s, G, \lambda) =
\mathbb{E}_{s_1}[\mathrm{MDT}(\lambda, s)]$, i.e., it is the expected mean-discounted time for $s$ when the starting
state is $s_1$, where the expectation measure is defined by the Markov chain with transition
function $\delta$. We now present a lemma that shows that the value function for multi-discounted
Markov chains can be expressed as a ratio of two polynomials (the result is obtained as a
simple extension of a result of Solan [25]).

Lemma 1. For Markov chains defined on state space $S$, for all initial states $s_0$, for all
states $s$, for all discount vectors $\lambda$, there exist two polynomials $g_1(\cdot)$ and $g_2(\cdot)$ in $|S|^2$
variables $x_{t,t'}$, where $t, t' \in S$, such that the following conditions hold:

1. the polynomials have degree at most $|S|$ with non-negative coefficients; and

2. for all transition functions $\delta$ over $S$ we have $\mathrm{MT}(s_0, s, G, \lambda) = \frac{g_1(\delta)}{g_2(\delta)}$, where $G =
(S, \delta)$, and $g_1(\delta)$ and $g_2(\delta)$ denote the values of the functions $g_1$ and $g_2$ when every
variable $x_{t,t'}$ is instantiated with the value $\delta(t)(t')$ given by the transition function
$\delta$.

Proof. (Sketch). We present a sketch of the proof (details in appendix). Fix a discount 
vector A. We construct a Markov chain G = (S, 5) as follows: S = S U Si, where 5i 
is a copy of states of S (and for a state a £ S we denote its corresponding copy as si); 
and the transition function S is defined below 

1. (5(si)(si) = 1 for all si £ Si (i.e., all copy states are absorbing); 

2. for s £ S we have 





[(l-A(s)) t = *i; 


5(s)(t) = 


I X(s) ■ S(s)(t) t£S; 




[o t£Si\si 



i.e., it goes to the copy with probability (1 — A(s)), it follows the transition S in the 
original copy with probabilities multiplied by A(s). 



We first show that for all $s_0$ and $s$ we have $\mathrm{MT}(s_0, s, G, \lambda) = \Pr^{\bar{\delta}}_{s_0}(\theta_{ex_S} = s_1)$;
i.e., the expected mean-discounted time in $s$ when the original Markov chain starts
in $s_0$ is the probability in the Markov chain $(\bar{S}, \bar{\delta})$ that the first hitting state out of $S$ is
the copy $s_1$ of the state $s$. The claim is easy to verify as both $(\mathrm{MT}(s_0, s, G, \lambda))_{s_0 \in S}$
and $(\Pr^{\bar{\delta}}_{s_0}(\theta_{ex_S} = s_1))_{s_0 \in S}$ are the unique solution of the following system of linear
equations: for all $t \in S$ we have $y_t = (1 - \lambda(t)) \cdot 1_{t=s} + \sum_{z \in S} \lambda(t) \cdot \delta(t)(z) \cdot y_z$.

We now claim that $\Pr^{\bar{\delta}}_{s_0}(ex_S < \infty) > 0$ for all $s_0 \in S$. This follows since for all
$s \in S$ we have $\bar{\delta}(s)(s_1) = (1 - \lambda(s)) > 0$, and since $s_1 \notin S$ we have $\Pr^{\bar{\delta}}_{s_0}(ex_S = 2) =
(1 - \lambda(s_0)) > 0$. Now we observe that we can apply Theorem 3 on the Markov chain
$\bar{G} = (\bar{S}, \bar{\delta})$ with $S$ as the set $C$ of states of Theorem 3 and obtain the result. Indeed
the terms $\alpha_f$ and $\beta_f(s, t)$ are independent of $\delta$, and the two products of Equation (1)
each contain at most $|S|$ terms of the form $\delta(s)(t)$ for $s, t \in S$. Thus the desired result
follows. ∎
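The construction in the proof sketch can be made concrete with a small script. The following is a minimal sketch, not the paper's implementation: it builds the augmented chain from a transition function and a discount vector, then pushes probability mass forward to approximate the probability of leaving $S$ through each copy state. The dictionary encoding, the tuple `("copy", s)` used for copy states, and the function names are illustrative assumptions.

```python
def augment(delta, lam):
    """Build the augmented transition function from delta and the discounts lam.

    delta: dict state -> dict successor -> probability (original chain).
    lam:   dict state -> discount factor in [0, 1).
    Copy states are encoded as ("copy", s); this encoding is hypothetical.
    """
    bar = {}
    for s, row in delta.items():
        copy = ("copy", s)
        bar[copy] = {copy: 1.0}            # copy states are absorbing
        bar[s] = {copy: 1.0 - lam[s]}      # leave S through the copy of s
        for t, p in row.items():
            bar[s][t] = lam[s] * p         # original transition scaled by lam(s)
    return bar


def exit_distribution(delta, lam, start, steps=200):
    """Approximate the probability of leaving S through each copy state."""
    bar = augment(delta, lam)
    mass = {start: 1.0}                    # probability mass still inside S
    exits = {s: 0.0 for s in delta}
    for _ in range(steps):
        nxt = {}
        for s, m in mass.items():
            for t, p in bar[s].items():
                if isinstance(t, tuple) and t[0] == "copy":
                    exits[t[1]] += m * p   # absorbed in the copy of t[1]
                else:
                    nxt[t] = nxt.get(t, 0.0) + m * p
        mass = nxt
    return exits
```

For the two-state chain where `s` moves to an absorbing `t` and every discount factor is 1/3, the exit probability through the copy of `t` comes out as approximately 1/3.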



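Lemma 2. Let $h(x_1, x_2, \ldots, x_k)$ be a polynomial function with non-negative coefficients
of degree at most $n$. Let $\varepsilon > 0$ and $y, y' \in \mathbb{R}^k$ be two non-negative vectors
such that for all $i = 1, 2, \ldots, k$ we have $\frac{1}{1+\varepsilon} \le \frac{y_i}{y'_i} \le 1 + \varepsilon$. Then we have

$$(1 + \varepsilon)^{-n} \le \frac{h(y)}{h(y')} \le (1 + \varepsilon)^{n}.$$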
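The statement of Lemma 2 can be checked numerically on a small polynomial. The sketch below is illustrative only: the sparse encoding of a polynomial as a map from exponent tuples to coefficients, the sampling ranges, and the function names are assumptions, not part of the paper.

```python
import random


def poly(coeffs, x):
    """Evaluate a polynomial given as {exponent tuple: non-negative coefficient}."""
    total = 0.0
    for exps, a in coeffs.items():
        term = a
        for xi, e in zip(x, exps):
            term *= xi ** e
        total += term
    return total


def check_ratio_bound(coeffs, degree, eps, trials=1000, seed=0):
    """Sample y, y' with coordinate ratios in [1/(1+eps), 1+eps] and test the bound."""
    rng = random.Random(seed)
    k = len(next(iter(coeffs)))
    lo, hi = (1 + eps) ** (-degree), (1 + eps) ** degree
    for _ in range(trials):
        y_prime = [rng.uniform(0.1, 1.0) for _ in range(k)]
        # each coordinate of y is within a factor (1 + eps) of y'
        y = [v * (1 + eps) ** rng.uniform(-1.0, 1.0) for v in y_prime]
        ratio = poly(coeffs, y) / poly(coeffs, y_prime)
        if not (lo - 1e-12 <= ratio <= hi + 1e-12):
            return False
    return True
```

For example, $h(x_1, x_2) = 2 x_1^2 x_2 + 3 x_2 + 1$ has degree 3 and is encoded as `{(2, 1): 2.0, (0, 1): 3.0, (0, 0): 1.0}`.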

Lemma 3. Let $G_1 = (S, \delta)$ and $G_2 = (S, \delta')$ be two structurally equivalent
Markov chains. For all non-negative reward functions $r : S \to \mathbb{R}$ such that the reward
function is bounded by 1, for all discount vectors $\lambda$, for all $s \in S$ we have
$|\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)| \le (1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1$;
i.e., the absolute difference of the value functions for the multi-discounted objective is
bounded by $(1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1$.

The proof of Lemma 3 uses Lemma 1 and Lemma 2 and is presented in the appendix.

Theorem 4. Let $G_1 = (S, \delta)$ and $G_2 = (S, \delta')$ be two structurally equivalent Markov
chains. Let $\eta$ be the minimum positive transition probability in $G_1$. The following
assertions hold:

1. For all non-negative reward functions $r : S \to \mathbb{R}$ such that the reward function is
bounded by 1, for all discount vectors $\lambda$, for all $s \in S$ we have

$$|\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)| \le (1 + \varepsilon_R)^{2 \cdot |S|} - 1 \le (1 + \varepsilon_A)^{2 \cdot |S|} - 1.$$

2. For all parity objectives $\Phi$ and for all $s \in S$ we have

$$|\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2, \Phi)(s)| \le (1 + \varepsilon_R)^{2 \cdot |S|} - 1 \le (1 + \varepsilon_A)^{2 \cdot |S|} - 1,$$

where $\varepsilon_R = \mathit{dist}_R(G_1, G_2)$ and $\varepsilon_A = \frac{\mathit{dist}_A(G_1, G_2)}{\eta}$.

Proof. The first part follows from Lemma 3 and Proposition 1. The second part follows
from part 1, the fact that the value function for parity objectives is obtained as the
limit of multi-discounted objectives (Theorem 2), and the fact that the bound for part 1 is
independent of the discount factors (hence independent of taking the limit). ∎

Remark on structural assumption in the proof. The result of the previous theorem
depends on the structural equivalence assumption in two crucial ways: (1) Proposition 1,
which establishes the relation between $\mathit{dist}_R$ and $\mathit{dist}_A$, only holds
under the assumption of structural equivalence; and (2) without the structural equivalence
assumption $\mathit{dist}_R$ is $\infty$, and hence without the assumption the bound of the previous
theorem is $\infty$, which is a trivial bound. We will later show (in Example 1) that the
structural equivalence assumption is necessary.

4 Value Continuity for Parity Objectives 

In this section we show two results: first we show robustness of strategies and present 
quantitative bounds on value functions for turn-based stochastic games and then we 
show continuity for concurrent parity games. 



4.1 Bounds for structurally equivalent turn-based stochastic parity games 

In this section we present quantitative bounds for robustness of optimal strategies in
structurally equivalent turn-based stochastic games. For every $\varepsilon > 0$, we present a
bound $\beta > 0$ such that if the distance between two structurally equivalent turn-based
stochastic games is at most $\beta$, then any pure memoryless optimal strategy in one game
is $\varepsilon$-optimal in the other. The result is first shown for MDPs and then extended to
turn-based stochastic games (both proofs are in the appendix).

Theorem 5. Let $G_1$ be a turn-based stochastic game such that the minimum positive
transition probability is $\eta > 0$. The following assertions hold:

1. For all turn-based stochastic games $G_2 \in [G_1]_\equiv$, for all parity objectives $\Phi$ and
for all $s \in S$ we have

$$|\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2, \Phi)(s)| \le (1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1 \le \left(1 + \frac{\mathit{dist}_A(G_1, G_2)}{\eta}\right)^{2 \cdot |S|} - 1.$$

2. For $\varepsilon > 0$, let $\beta \le \frac{\eta}{2} \cdot \left(\left(1 + \frac{\varepsilon}{2}\right)^{\frac{1}{2 \cdot |S|}} - 1\right)$. For all $G_2 \in [G_1]_\equiv$ such that
$\mathit{dist}_A(G_1, G_2) < \beta$, for all parity objectives $\Phi$, every pure memoryless optimal
strategy $\pi_1$ in $G_1$ is an $\varepsilon$-optimal strategy in $G_2$.



4.2 Value continuity for concurrent parity games 

In this section we show value continuity for structurally equivalent concurrent parity
games, and show with an example on Markov chains that the continuity property breaks
without the structural equivalence assumption. Finally, with an example on Markov
chains, we show that our quantitative bounds are asymptotically optimal for small
distance values. We start with a lemma for MDPs.

Lemma 4. Let $G_1$ and $G_2$ be two structurally equivalent MDPs. Let $\eta$ be the minimum
positive transition probability in $G_1$. For all non-negative reward functions $r : S \to \mathbb{R}$
such that the reward function is bounded by 1, for all discount vectors $\lambda$, for all $s \in S$
we have

$$|\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)| \le (1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1 \le \left(1 + \frac{\mathit{dist}_A(G_1, G_2)}{\eta}\right)^{2 \cdot |S|} - 1.$$

The main idea of the proof of the above lemma is to fix a pure memoryless optimal
strategy and then use the results for Markov chains. Using the same proof idea, along
with randomized memoryless optimal strategies for concurrent game structures and the
above lemma, we obtain the following lemma (the result is identical to the previous
lemma, but for concurrent game structures instead of MDPs).

Lemma 5. Let $G_1$ and $G_2$ be two structurally equivalent concurrent game structures.
Let $\eta$ be the minimum positive transition probability in $G_1$. For all non-negative reward
functions $r : S \to \mathbb{R}$ such that the reward function is bounded by 1, for all discount
vectors $\lambda$, for all $s \in S$ we have

$$|\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)| \le (1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1 \le \left(1 + \frac{\mathit{dist}_A(G_1, G_2)}{\eta}\right)^{2 \cdot |S|} - 1.$$

We now present the main theorem that depends on Lemma 5.

Theorem 6. Let $G_1$ and $G_2$ be two structurally equivalent concurrent game structures.
Let $\eta$ be the minimum positive transition probability in $G_1$. For all parity objectives $\Phi$
and for all $s \in S$ we have

$$|\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2, \Phi)(s)| \le (1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1 \le \left(1 + \frac{\mathit{dist}_A(G_1, G_2)}{\eta}\right)^{2 \cdot |S|} - 1.$$



Proof. The result follows from Theorem 2, Lemma 5, and the fact that the bound of
Lemma 3 is independent of the discount factors and hence independent of taking the
limit. ∎

In the following theorem we show that for structurally equivalent game structures,
for all parity objectives, the value function is continuous in the absolute distance
between the game structures. We have already remarked (after Theorem 4) that the
structural equivalence assumption is required in our proofs, and we show in Example 1 that
this assumption is necessary.

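Theorem 7. For all concurrent game structures $G_1$, for all parity objectives $\Phi$,

$$\lim_{\varepsilon \to 0} \; \sup_{G_2 \in [G_1]_\equiv,\ \mathit{dist}_A(G_1, G_2) \le \varepsilon} \; \sup_{s \in S} |\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2, \Phi)(s)| = 0.$$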

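Proof. Let $\eta > 0$ be the minimum positive transition probability in $G_1$. By Theorem 6
we have

$$\lim_{\varepsilon \to 0} \; \sup_{G_2 \in [G_1]_\equiv,\ \mathit{dist}_A(G_1, G_2) \le \varepsilon} \; \sup_{s \in S} |\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2, \Phi)(s)| \le \lim_{\varepsilon \to 0} \left(\left(1 + \frac{\varepsilon}{\eta}\right)^{2 \cdot |S|} - 1\right).$$

The above limit equals 0, and the desired result follows. ∎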
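The limit in the proof can be illustrated by evaluating the bound for shrinking distances. This is a small sketch that assumes the bound reads $(1 + \varepsilon/\eta)^{2 \cdot |S|} - 1$; the function name and the sample parameters are illustrative.

```python
def value_bound(eps, eta, num_states):
    """Evaluate (1 + eps / eta) ** (2 * |S|) - 1 for a given absolute distance eps."""
    return (1.0 + eps / eta) ** (2 * num_states) - 1.0


def bounds_for_shrinking_distance(eta, num_states, exponents=range(1, 7)):
    """Bound values for eps = 10^-1, 10^-2, ..., showing the decay towards 0."""
    return [value_bound(10.0 ** (-k), eta, num_states) for k in exponents]
```

With $\eta$ and $|S|$ fixed, the bound decreases to 0 as the distance shrinks, but it is only informative once $\varepsilon$ is small compared with $\eta / |S|$.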

Example 1 (Structural equivalence assumption necessary). In this example we show
that in Theorem 7 the structural equivalence assumption is necessary, and thereby show
that the result is tight. We show a Markov chain $G_1$ and a family of Markov chains
$G_2^\varepsilon$, for $\varepsilon > 0$, such that $\mathit{dist}_A(G_1, G_2^\varepsilon) \le \varepsilon$ (but $G_1$ is not structurally
equivalent to $G_2^\varepsilon$), with a parity objective $\Phi$ such that
$\lim_{\varepsilon \to 0} \sup_{s \in S} |\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2^\varepsilon, \Phi)(s)| = 1$.
The Markov chains $G_1$ and $G_2^\varepsilon$ are defined over the state space
$\{s_0, s_1\}$. In $G_1$ both states have self-loops with probability 1, and in $G_2^\varepsilon$ the
self-loop at $s_0$ has probability $1 - \varepsilon$ and the transition probability from $s_0$ to $s_1$
is $\varepsilon$ (see Fig. 3 in the appendix). Clearly, $\mathit{dist}_A(G_1, G_2^\varepsilon) = \varepsilon$. The parity objective $\Phi$
requires visiting the state $s_1$ infinitely often (i.e., assign priority 2 to $s_1$ and
priority 1 to $s_0$). Then we have $\mathrm{Val}(G_1, \Phi)(s_0) = 0$ as the state $s_0$ is never left,
whereas in $G_2^\varepsilon$ the state $s_1$ is the only closed recurrent set of the Markov chain and
hence is reached with probability 1 from $s_0$. Hence $\mathrm{Val}(G_2^\varepsilon, \Phi)(s_0) = 1$. It follows that
$\lim_{\varepsilon \to 0} \sup_{s \in S} |\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2^\varepsilon, \Phi)(s)| = 1$. ∎
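The discontinuity can be seen numerically. The following sketch (the function name is illustrative) computes the probability that the chain of Example 1 has moved from $s_0$ to $s_1$ within a given number of rounds, when the transition from $s_0$ to $s_1$ has probability $\varepsilon$.

```python
def prob_left_s0(eps, steps):
    """Probability of having moved from s0 to s1 within `steps` rounds.

    In each round the chain stays in s0 with probability 1 - eps, so it is still
    in s0 after `steps` rounds with probability (1 - eps) ** steps.
    """
    return 1.0 - (1.0 - eps) ** steps
```

For $\varepsilon = 0$ the probability stays 0 for every horizon, while for any fixed $\varepsilon > 0$ it tends to 1 as the horizon grows, which is the jump in the value at $s_0$.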

Example 2 (Asymptotically tight bound for small distances). We now show that our
quantitative bound for the value function difference is asymptotically optimal for small
distances. Let us denote the absolute distance by $\varepsilon$. The quantitative bound we obtain
in Theorem 6 is $\left(1 + \frac{\varepsilon}{\eta}\right)^{2 \cdot |S|} - 1$, and if $\varepsilon$ is small, then we obtain the following
approximate bound:

$$\left(1 + \frac{\varepsilon}{\eta}\right)^{2 \cdot |S|} - 1 \approx 1 + 2 \cdot |S| \cdot \frac{\varepsilon}{\eta} - 1 = 2 \cdot |S| \cdot \frac{\varepsilon}{\eta}.$$

We now illustrate with an example (on structurally equivalent Markov chains) where
the difference in the value function is $O(|S| \cdot \varepsilon)$, for small $\varepsilon$. Consider the Markov chain
defined on state space $S = \{s_0, s_1, \ldots, s_{2n-1}, s_{2n}\}$ as follows: states $s_0$ and $s_{2n}$ are
absorbing (states with self-loops of probability 1) and for a state $s_i$ with $1 \le i \le 2n - 1$ we
have $\delta(s_i)(s_{i-1}) = \frac{1}{2} + \varepsilon$ and $\delta(s_i)(s_{i+1}) = \frac{1}{2} - \varepsilon$; i.e., we have a Markov chain
defined on a line from 0 to $2n$ (with 0 and $2n$ absorbing states) and the chain moves
towards 0 with probability $\frac{1}{2} + \varepsilon$ and towards $2n$ with probability $\frac{1}{2} - \varepsilon$ (see Fig. 4 with
complete details in the appendix). Our goal is to estimate the probability to reach the state
$s_0$, and let $v_i$ denote the probability to reach $s_0$ from the starting state $s_i$. We show
(details in the appendix) that if $\varepsilon = 0$, then $v_n = \frac{1}{2}$, and for $0 < \varepsilon < \frac{1}{2}$ such that $\varepsilon$ is close
to 0, we have $v_n \approx \frac{1}{2} + n \cdot \varepsilon$. Observe that the Markov chains obtained for $\varepsilon = 0$ and
$\frac{1}{2} > \varepsilon > 0$ are structurally equivalent. Thus the desired result follows. ∎
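The estimate for $v_n$ can be compared against exact values. The sketch below does not follow the appendix derivation; it uses the standard gambler's-ruin closed form for a biased walk on a line with two absorbing ends, evaluated with exact rational arithmetic. The function name and the closed form are assumptions brought in for illustration.

```python
from fractions import Fraction


def reach_s0(i, n, eps):
    """Exact probability of reaching s_0 from s_i on the line s_0, ..., s_{2n}.

    The walk steps towards s_0 with probability 1/2 + eps and towards s_{2n}
    with probability 1/2 - eps; both end states are absorbing.
    """
    eps = Fraction(eps)
    if eps == 0:
        return 1 - Fraction(i, 2 * n)      # symmetric walk: linear in i
    r = (Fraction(1, 2) + eps) / (Fraction(1, 2) - eps)
    # gambler's-ruin closed form (standard result, not taken from the paper)
    return (r ** i - r ** (2 * n)) / (1 - r ** (2 * n))
```

For $n = 5$ and $\varepsilon = 1/1000$ the exact value of $v_n$ agrees with $\frac{1}{2} + n \cdot \varepsilon = 0.505$ up to higher-order terms in $\varepsilon$.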



5 Conclusion 



In this work we studied the robustness and continuity properties of concurrent and
turn-based stochastic parity games with respect to small imprecision in the transition
probabilities. We (i) presented quantitative bounds on the difference of the value functions and
proved value continuity for concurrent parity games under the structural equivalence
assumption, and (ii) showed robustness of all pure memoryless optimal strategies for
structurally equivalent turn-based stochastic parity games. We also showed that the
structural equivalence assumption is necessary and that our quantitative bounds are
asymptotically optimal for small imprecision. We believe our results will find
applications in robustness analysis of various other classes of stochastic games.

Appendix

6 Missing proofs of Section 2 

Proof (of Proposition 1). Consider $s \in S$, $a \in \Gamma_1(s)$, $b \in \Gamma_2(s)$, and $t \in
\mathrm{Supp}(\delta_1(s, a, b)) = \mathrm{Supp}(\delta_2(s, a, b))$. We consider the ratio $\frac{\delta_2(s,a,b)(t)}{\delta_1(s,a,b)(t)}$; the argument
for $\frac{\delta_1(s,a,b)(t)}{\delta_2(s,a,b)(t)}$ is symmetric. If $\delta_2(s, a, b)(t) \le \delta_1(s, a, b)(t)$, then
$\frac{\delta_2(s,a,b)(t)}{\delta_1(s,a,b)(t)} \le 1$, and otherwise we have the following inequality:

$$\frac{\delta_2(s, a, b)(t)}{\delta_1(s, a, b)(t)} \le \frac{\delta_1(s, a, b)(t) + \mathit{dist}_A(G_1, G_2)}{\delta_1(s, a, b)(t)} = 1 + \frac{\mathit{dist}_A(G_1, G_2)}{\delta_1(s, a, b)(t)} \le 1 + \frac{\mathit{dist}_A(G_1, G_2)}{\eta}.$$

It follows that in both cases we have $\frac{\delta_2(s,a,b)(t)}{\delta_1(s,a,b)(t)} - 1 \le \frac{\mathit{dist}_A(G_1, G_2)}{\eta}$. The desired result
follows from the above inequalities. ∎
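The inequality derived above can be checked on a small instance. The sketch below specializes to Markov chains (no moves $a$, $b$) and checks only the ratio $\delta_2 / \delta_1$ that the proof treats explicitly. The dictionary encoding and the helper names are illustrative assumptions, and both chains are assumed to share the same supports.

```python
def dist_abs(d1, d2):
    """Largest absolute difference between corresponding transition probabilities."""
    return max(abs(d1[s][t] - d2[s][t]) for s in d1 for t in d1[s])


def min_positive(d1):
    """Minimum positive transition probability (eta) of the first chain."""
    return min(p for row in d1.values() for p in row.values() if p > 0)


def max_ratio_excess(d1, d2):
    """Largest value of delta_2(s)(t) / delta_1(s)(t) - 1 over supported transitions."""
    return max(d2[s][t] / d1[s][t] - 1.0 for s in d1 for t in d1[s])
```

On the chains with rows `{"s0": 0.5, "s1": 0.5}` and `{"s0": 0.4, "s1": 0.6}` at `s0`, the excess ratio and $\mathit{dist}_A / \eta$ coincide up to rounding, so the inequality is tight there.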



7 Missing proofs of Section 3 

We now present the proof of Lemma 1, which is obtained as a simple extension of a
result of Solan [25].



Proof (of Lemma 1). Fix a discount vector $\lambda$. We construct a Markov chain $\bar{G} = (\bar{S}, \bar{\delta})$
as follows: $\bar{S} = S \cup S_1$, where $S_1$ is a copy of the states of $S$ (for a state $s \in S$ we
denote its corresponding copy by $s_1$); and the transition function $\bar{\delta}$ is defined below:

1. $\bar{\delta}(s_1)(s_1) = 1$ for all $s_1 \in S_1$ (i.e., all copy states are absorbing);

2. for $s \in S$ we have

$$\bar{\delta}(s)(t) = \begin{cases} 1 - \lambda(s) & t = s_1; \\ \lambda(s) \cdot \delta(s)(t) & t \in S; \\ 0 & t \in S_1 \setminus \{s_1\}; \end{cases}$$

i.e., the chain goes to the copy with probability $1 - \lambda(s)$, and follows the transition $\delta$ in the
original copy with probabilities multiplied by $\lambda(s)$.

We first show that for all $s_0$ and $s$ we have

$$\mathrm{MT}(s_0, s, G, \lambda) = \Pr^{\bar{\delta}}_{s_0}(\theta_{ex_S} = s_1);$$

i.e., the expected mean-discounted time in $s$ when the original Markov chain starts in
$s_0$ is the probability in the Markov chain $(\bar{S}, \bar{\delta})$ that the first hitting state out of $S$ is the
copy $s_1$ of the state $s$. The claim is easy to verify as both $(\mathrm{MT}(s_0, s, G, \lambda))_{s_0 \in S}$ and
$(\Pr^{\bar{\delta}}_{s_0}(\theta_{ex_S} = s_1))_{s_0 \in S}$ are the solutions of the following system of linear equations:

$$y_t = (1 - \lambda(t)) \cdot 1_{t=s} + \sum_{z \in S} \lambda(t) \cdot \delta(t)(z) \cdot y_z \qquad \forall t \in S.$$



The fact that $(\mathrm{MT}(s_0, s, G, \lambda))_{s_0 \in S}$ is a solution of the above system follows from
the results on discounted-reward Markov chains (detailed proofs with a uniform discount
factor for MDPs are available in [16] (e.g., Equation 2.15 of [16]), and the specialization to
Markov chains and the generalization to a discount factor attached to every state are
straightforward). The fact that $(\Pr^{\bar{\delta}}_{s_0}(\theta_{ex_S} = s_1))_{s_0 \in S}$ is a solution of the above system
follows from the characterization of hitting times for transient Markov chains
(see [13] for details). Moreover, the above system of linear equations has a unique solution.
The uniqueness of the solution follows from the fact that the system is a contraction mapping,
and the proof is as follows: let $(y^1_z)_{z \in S}$ and $(y^2_z)_{z \in S}$ be two solutions of the system. We
choose $z^* \in S$ such that $z^* = \arg\max_{z \in S} |y^1_z - y^2_z|$, i.e., $z^*$ is a state that maximizes
the difference of the two solutions. Let $\eta = |y^1_{z^*} - y^2_{z^*}|$. As $y^1$ and $y^2$ are solutions of
the above system, we have by the triangle inequality

$$0 \le \eta = |y^1_{z^*} - y^2_{z^*}| \le \sum_{t \in S} \lambda(z^*) \cdot \delta(z^*)(t) \cdot |y^1_t - y^2_t| \le \sum_{t \in S} \lambda(z^*) \cdot \delta(z^*)(t) \cdot \eta \le \eta \cdot \max_{t \in S} \lambda(t) \cdot \sum_{t \in S} \delta(z^*)(t).$$

Since $\sum_{t \in S} \delta(z^*)(t) = 1$, it follows that $\eta \le \eta \cdot \max_{t \in S} \lambda(t)$. Since $\max_{t \in S} \lambda(t) < 1$,
it follows that we must have $\eta = 0$, and hence the two solutions must coincide.
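The contraction argument also suggests a simple way to compute the solution: iterate the right-hand side of the system from any starting vector. The sketch below is an illustration under that reading; the dictionary encoding and the function name are assumptions.

```python
def iterate_system(delta, lam, target, y0, rounds=200):
    """Iterate y_t <- (1 - lam(t)) * [t == target] + sum_z lam(t) * delta(t)(z) * y_z.

    Because every discount factor is below 1, each round shrinks the distance
    between any two iterates, so all starting vectors converge to one solution.
    """
    y = dict(y0)
    for _ in range(rounds):
        y = {
            t: (1.0 - lam[t]) * (1.0 if t == target else 0.0)
            + sum(lam[t] * p * y[z] for z, p in delta[t].items())
            for t in delta
        }
    return y
```

For the chain that alternates between `s` and `t` with discount factor 1/3 and target `t`, the iteration converges to $y_s = 1/4$ and $y_t = 3/4$ regardless of the starting vector.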

We now claim that $\Pr^{\bar{\delta}}_{s_0}(ex_S < \infty) > 0$ for all $s_0 \in S$. This follows since for all
$s \in S$ we have $\bar{\delta}(s)(s_1) = (1 - \lambda(s)) > 0$, and since $s_1 \notin S$ we have $\Pr^{\bar{\delta}}_{s_0}(ex_S = 2) =
(1 - \lambda(s_0)) > 0$. Now we observe that we can apply Theorem 3 on the Markov chain
$\bar{G} = (\bar{S}, \bar{\delta})$ with $S$ as the set $C$ of states of Theorem 3 and obtain the result. Indeed
the terms $\alpha_f$ and $\beta_f(s, t)$ are independent of $\delta$, and the two products of Equation (1)
each contain at most $|S|$ terms of the form $\delta(s)(t)$ for $s, t \in S$. Thus the desired result
follows. ∎

Example 3 (Illustration of the construction of Lemma 1). We now illustrate the
construction of Lemma 1 with the aid of some examples. Consider the Markov chain $G$ with
states $s$ and $t$ such that $t$ is absorbing and the transition from $s$ to $t$ has probability 1,
and let the discount factor be 1/3 for all states. The Markov chain $G$ along with $\bar{G}$ is
shown in Fig. 1. If we start at $s$, the mean-discounted time at $t$ is given by

$$\frac{1/3^2 + 1/3^3 + \cdots}{1/3 + 1/3^2 + 1/3^3 + \cdots} = \frac{1/9 \cdot 3/2}{1/3 \cdot 3/2} = \frac{1}{3}.$$

In the Markov chain $\bar{G}$, the probability to reach $t$ from $s$ is 1/3, and once $t$ is reached
the exit state is $t_1$ with probability 1. Hence the probability to exit through state $t_1$ is
also 1/3.

We now consider another example to illustrate further. Consider the Markov chains
$G$ and $\bar{G}$ in Fig. 2, where $G$ alternates between states $s$ and $t$, and the discount factor
is 1/3. If we start at state $s$, the mean-discounted time at $t$ is given by

$$\frac{1/3^2 + 1/3^4 + 1/3^6 + \cdots}{1/3 + 1/3^2 + 1/3^3 + \cdots} = \frac{1/9 \cdot 9/8}{1/3 \cdot 3/2} = \frac{1}{4}.$$

Fig. 1. Markov chains $G$ and $\bar{G}$.

Fig. 2. Markov chains $G$ and $\bar{G}$.


The probability to exit through $t_1$ in $\bar{G}$ in 2 steps is $1/3 \cdot 2/3$, in 4 steps is $1/3^3 \cdot 2/3$,
and so on. Hence the probability to exit through $t_1$ in $\bar{G}$ is

$$2/3 \cdot (1/3 + 1/3^3 + 1/3^5 + \cdots) = 2/3 \cdot 1/3 \cdot 9/8 = 1/4.$$

The above examples show how the mean-discounted time in $G$ and the exit probability
in $\bar{G}$ have the same value. ∎
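The two series in Example 3 can be reproduced by summing discounted weights along the run of each chain. The sketch below assumes, as the examples suggest, that the weight of step $j$ is the product of the discount factors of the states visited up to and including step $j$. It handles only deterministic runs, and its names are illustrative.

```python
def mean_discounted_time(path_state, lam, target, horizon=200):
    """Truncated ratio of discounted visits to `target` over total discounted weight.

    path_state(j) returns the state visited at step j of a deterministic run.
    """
    weight, num, den = 1.0, 0.0, 0.0
    for j in range(horizon):
        s = path_state(j)
        weight *= lam[s]      # product of discount factors up to step j
        den += weight
        if s == target:
            num += weight
    return num / den


lam = {"s": 1 / 3, "t": 1 / 3}
first = mean_discounted_time(lambda j: "s" if j == 0 else "t", lam, "t")
second = mean_discounted_time(lambda j: "s" if j % 2 == 0 else "t", lam, "t")
```

Here `first` corresponds to the chain of Fig. 1 and `second` to the alternating chain of Fig. 2; they evaluate to approximately 1/3 and 1/4.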

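Proof (of Lemma 2). We first write $h(x)$ as follows:

$$h(x) = \sum_{i=1}^{\ell} a_i \cdot \prod_{j=1}^{n_i} x_{k_{ij}},$$

where $\ell \in \mathbb{N}$, for all $i = 1, 2, \ldots, \ell$ we have $a_i > 0$, $n_i \le n$, and $1 \le k_{ij} \le k$ for each
$j = 1, 2, \ldots, n_i$. By the hypothesis of the lemma, for all $i = 1, 2, \ldots, \ell$ we have

$$\frac{1}{(1 + \varepsilon)^{n}} \cdot \prod_{j=1}^{n_i} y'_{k_{ij}} \le \prod_{j=1}^{n_i} y_{k_{ij}} \le (1 + \varepsilon)^{n} \cdot \prod_{j=1}^{n_i} y'_{k_{ij}}.$$

Since every $a_i > 0$, multiplying the above inequalities by $a_i$ and summing over $i =
1, 2, \ldots, \ell$ yields the desired result. ∎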

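Proof (of Lemma 3). We first observe that for a Markov chain $G$ we have
$\mathrm{Val}(G, \mathrm{MDT}(\lambda, r))(s) = \sum_{t \in S} r(t) \cdot \mathrm{MT}(s, t, G, \lambda)$, i.e., the value function for
a state $s$ is obtained as the sum of the products of the mean-discounted times of states
and the rewards, with $s$ as the starting state. Hence by Lemma 1 it follows that
$\mathrm{Val}(G, \mathrm{MDT}(\lambda, r))(s)$ can be expressed as a ratio $\frac{g_1(\delta)}{g_2(\delta)}$ of two polynomials of degree
at most $|S|$ over $|S|^2$ variables. Hence we have

$$\frac{\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s)}{\mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)} = \frac{g_1(\delta)}{g_1(\delta')} \cdot \frac{g_2(\delta')}{g_2(\delta)}.$$

Let $\varepsilon = \mathit{dist}_R(G_1, G_2)$. By definition, for all $s_1, s_2 \in S$, if $s_2 \in \mathrm{Supp}(\delta(s_1))$, then
both $\frac{\delta(s_1)(s_2)}{\delta'(s_1)(s_2)}$ and $\frac{\delta'(s_1)(s_2)}{\delta(s_1)(s_2)}$ are between $\frac{1}{1+\varepsilon}$ and $1 + \varepsilon$. It follows from Lemma 2
with $k = |S|^2$ that

$$(1 + \varepsilon)^{-|S|} \le \frac{g_i(\delta)}{g_i(\delta')} \le (1 + \varepsilon)^{|S|} \quad \text{for } i \in \{1, 2\}.$$

Thus we have

$$(1 + \varepsilon)^{-2 \cdot |S|} \le \frac{g_1(\delta)}{g_1(\delta')} \cdot \frac{g_2(\delta')}{g_2(\delta)} \le (1 + \varepsilon)^{2 \cdot |S|}.$$

Hence we have

$$(1 + \varepsilon)^{-2 \cdot |S|} \le \frac{\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s)}{\mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)} \le (1 + \varepsilon)^{2 \cdot |S|}.$$

We consider the case when $\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) \ge \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)$; the
argument for the other case is symmetric. We also assume without loss of generality
that $\mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) > 0$. Otherwise, if $\mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) = 0$,
since rewards are non-negative, it follows that no state with positive reward is
reachable from $s$ in either $G_1$ or $G_2$ (because if such a state is reachable, then it
is reachable with positive probability and then the value is positive), and hence
$\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) = \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) = 0$ and the result of the lemma follows
trivially. Since we assume that $\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) \ge \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)$ and
$\mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) > 0$, we have

$$|\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)| \le \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) \cdot \left((1 + \varepsilon)^{2 \cdot |S|} - 1\right).$$

Since the reward function is bounded by 1, it follows that $\mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) \le 1$,
and hence we have

$$|\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)| \le (1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1.$$

The desired result follows. ∎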
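The bound of Lemma 3 can be checked on a two-state instance. The sketch below computes the multi-discounted value by iterating $v_t = (1 - \lambda(t)) \cdot r(t) + \lambda(t) \cdot \sum_z \delta(t)(z) \cdot v_z$, which is obtained by combining the value formula above with the linear system used for Lemma 1. It assumes that $\mathit{dist}_R$ is the largest ratio of corresponding probabilities minus 1, as used in the proof. Function names and the encoding are illustrative.

```python
def mdt_values(delta, lam, reward, rounds=500):
    """Fixed-point iteration for v_t = (1 - lam(t)) r(t) + lam(t) sum_z delta(t)(z) v_z."""
    v = {t: 0.0 for t in delta}
    for _ in range(rounds):
        v = {
            t: (1.0 - lam[t]) * reward[t]
            + lam[t] * sum(p * v[z] for z, p in delta[t].items())
            for t in delta
        }
    return v


def dist_ratio(d1, d2):
    """Assumed form of dist_R: largest ratio of corresponding probabilities, minus 1."""
    return max(
        max(d1[s][t] / d2[s][t], d2[s][t] / d1[s][t]) for s in d1 for t in d1[s]
    ) - 1.0


def lemma3_bound(d1, d2):
    """(1 + dist_R) ** (2 * |S|) - 1 for two chains over the same state space."""
    return (1.0 + dist_ratio(d1, d2)) ** (2 * len(d1)) - 1.0
```

On a uniform two-state chain perturbed by 0.01 in one row, with discount 0.9 and reward 1 on one state, the values differ by roughly 0.005 while the bound is roughly 0.084.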

8 Missing proofs of Section 4 

8.1 Details of Subsection 4.1 

We first show the desired result for MDPs and then extend to turn-based stochastic 
games. 



Theorem 8. Let $G_1$ be a player-1 MDP such that the minimum positive transition
probability is $\eta > 0$. The following assertions hold:

1. For all player-1 MDPs $G_2 \in [G_1]_\equiv$, for all parity objectives $\Phi$ and for all $s \in S$
we have

$$|\mathrm{Val}(G_1, \Phi)(s) - \mathrm{Val}(G_2, \Phi)(s)| \le (1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1.$$

2. For $\varepsilon > 0$, let $\beta \le \frac{\eta}{2} \cdot \left(\left(1 + \frac{\varepsilon}{2}\right)^{\frac{1}{2 \cdot |S|}} - 1\right)$. For all $G_2 \in [G_1]_\equiv$ such that
$\mathit{dist}_A(G_1, G_2) < \beta$, for all parity objectives $\Phi$, every pure memoryless optimal
strategy $\pi_1$ in $G_1$ is an $\varepsilon$-optimal strategy in $G_2$. In other words, for the interval
$[0, \beta)$, every pure memoryless optimal strategy in $G_1$ is an $\varepsilon$-optimal strategy in all
structurally equivalent MDPs of $G_1$ such that the distance lies in the interval $[0, \beta)$.

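Proof. We prove the two parts below.

1. Without loss of generality, let $\mathrm{Val}(G_1, \Phi)(s) \ge \mathrm{Val}(G_2, \Phi)(s)$. Let $\pi_1$ be a pure
memoryless optimal strategy in $G_1$; such a strategy exists by Theorem 1. Then
we have the following inequality:

$$\begin{aligned} \mathrm{Val}(G_2, \Phi)(s) &\ge \mathrm{Val}(G_2 \upharpoonright \pi_1, \Phi)(s) \\ &\ge \mathrm{Val}(G_1 \upharpoonright \pi_1, \Phi)(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right) \\ &= \mathrm{Val}(G_1, \Phi)(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right). \end{aligned}$$

The (in)equalities are obtained as follows: the first inequality holds because the value in
$G_2$ is at least the value in $G_2$ obtained by fixing a particular strategy (in this case
$\pi_1$); the second inequality is obtained by applying Theorem 4 on the structurally
equivalent Markov chains $G_1 \upharpoonright \pi_1$ and $G_2 \upharpoonright \pi_1$; and the final equality follows
since $\pi_1$ is an optimal strategy in $G_1$. The desired result follows.

2. Let $G_2 \in [G_1]_\equiv$ such that $\mathit{dist}_A(G_1, G_2) < \beta$. Let $\pi_1$ be any pure memoryless
optimal strategy in $G_1$. Then we have the following inequality:

$$\begin{aligned} \mathrm{Val}(G_2 \upharpoonright \pi_1, \Phi)(s) &\ge \mathrm{Val}(G_1 \upharpoonright \pi_1, \Phi)(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right) \\ &= \mathrm{Val}(G_1, \Phi)(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right) \\ &\ge \mathrm{Val}(G_2, \Phi)(s) - 2 \cdot \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right). \end{aligned}$$

The first inequality is a consequence of Theorem 4 applied on the Markov chains $G_2 \upharpoonright
\pi_1$ and $G_1 \upharpoonright \pi_1$; the equality follows from the fact that $\pi_1$ is an optimal strategy in $G_1$;
and the final inequality follows by applying the result of part 1. Hence to prove that
$\pi_1$ is $\varepsilon$-optimal in $G_2$ we need to show that

$$2 \cdot \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right) \le \varepsilon. \qquad (2)$$

We have

$$(1 + \mathit{dist}_R(G_1, G_2)) \le \left(1 + \frac{\mathit{dist}_A(G_1, G_2)}{\eta}\right);$$

the inequality follows from Proposition 1. Hence to prove inequality (2) it suffices
to show that

$$\left(1 + \frac{\mathit{dist}_A(G_1, G_2)}{\eta}\right)^{2 \cdot |S|} \le 1 + \frac{\varepsilon}{2}.$$

Since $\beta \le \frac{\eta}{2} \cdot \left(\left(1 + \frac{\varepsilon}{2}\right)^{\frac{1}{2 \cdot |S|}} - 1\right)$, we obtain the desired inequality.

The desired result follows. ∎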
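The arithmetic in part 2 can be checked numerically. The sketch below assumes the threshold reads $\beta = \frac{\eta}{2} \cdot ((1 + \frac{\varepsilon}{2})^{1/(2 \cdot |S|)} - 1)$ and that the loss of the strategy is bounded by $2 \cdot ((1 + \mathit{dist}_A / \eta)^{2 \cdot |S|} - 1)$; both readings are reconstructions, and the function names are illustrative.

```python
def beta_threshold(eps, eta, num_states):
    """beta = (eta / 2) * ((1 + eps / 2) ** (1 / (2 |S|)) - 1)."""
    return (eta / 2.0) * ((1.0 + eps / 2.0) ** (1.0 / (2 * num_states)) - 1.0)


def strategy_loss_bound(dist_a, eta, num_states):
    """2 * ((1 + dist_a / eta) ** (2 |S|) - 1), the loss bound from part 2 of the proof."""
    return 2.0 * ((1.0 + dist_a / eta) ** (2 * num_states) - 1.0)
```

For $\varepsilon = 0.1$, $\eta = 0.2$ and $|S| = 4$, the threshold is about $6.1 \cdot 10^{-4}$, and the loss bound at that distance stays below $\varepsilon$.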

Proof (of Theorem 5). The proof essentially repeats the proof of Theorem 8: as
in MDPs, pure memoryless optimal strategies exist in turn-based stochastic games with
parity objectives (Theorem 1), and once a pure memoryless strategy is fixed in a
turn-based stochastic game we obtain an MDP. Since Theorem 8 extends the result of
Theorem 4 from Markov chains to MDPs, the desired result follows by mimicking the
proof of Theorem 8 and, instead of using the result of Theorem 4 for Markov
chains, using the result of Theorem 8 for MDPs. ∎

8.2 Details of Subsection 4.2 

Proof (of Lemma 4). The proof essentially mimics the proof of part (1)
of Theorem 8. Without loss of generality, let $\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) \ge
\mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)$. Let $\pi_1$ be a pure memoryless optimal strategy in $G_1$; such
a strategy exists by Theorem 1. Then we have the following inequality:

$$\begin{aligned} \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) &\ge \mathrm{Val}(G_2 \upharpoonright \pi_1, \mathrm{MDT}(\lambda, r))(s) \\ &\ge \mathrm{Val}(G_1 \upharpoonright \pi_1, \mathrm{MDT}(\lambda, r))(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right) \\ &= \mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right). \end{aligned}$$

The (in)equalities are obtained as follows: the first inequality holds because the value in $G_2$
is at least the value in $G_2$ obtained by fixing a particular strategy (in this case $\pi_1$);
the second inequality is obtained by applying Theorem 4 on the structurally equivalent
Markov chains $G_1 \upharpoonright \pi_1$ and $G_2 \upharpoonright \pi_1$; and the final equality follows since $\pi_1$ is an
optimal strategy in $G_1$. The desired result follows. ∎

Proof (of Lemma 5). The proof essentially mimics the proof of Lemma 4. Without
loss of generality, let $\mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) \ge \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s)$. Let $\pi_1$
be a randomized memoryless optimal strategy in $G_1$; such a strategy exists by
Theorem 1. Then we have the following inequality:

$$\begin{aligned} \mathrm{Val}(G_2, \mathrm{MDT}(\lambda, r))(s) &\ge \mathrm{Val}(G_2 \upharpoonright \pi_1, \mathrm{MDT}(\lambda, r))(s) \\ &\ge \mathrm{Val}(G_1 \upharpoonright \pi_1, \mathrm{MDT}(\lambda, r))(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right) \\ &= \mathrm{Val}(G_1, \mathrm{MDT}(\lambda, r))(s) - \left((1 + \mathit{dist}_R(G_1, G_2))^{2 \cdot |S|} - 1\right). \end{aligned}$$

The arguments for the inequalities are exactly the same as in Lemma 4. The desired
result follows. ∎



Fig. 3. Markov chains $G_1$ and $G_2^\varepsilon$ for Example 1.



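Example 4 (Asymptotically tight bound for small distances). We now show that our
quantitative bound for the value function difference is asymptotically optimal for small
distances. Let us denote the absolute distance by $\varepsilon$. The quantitative bound we obtain in
Theorem 6 is $\left(1 + \frac{\varepsilon}{\eta - \varepsilon}\right)^{2 \cdot |S|} - 1$, and if $\varepsilon$ is small ($\varepsilon \ll \eta$ and $\varepsilon$ close to zero), we
obtain the following approximate bound:

$$\left(1 + \frac{\varepsilon}{\eta - \varepsilon}\right)^{2 \cdot |S|} - 1 \approx \left(1 + \frac{\varepsilon}{\eta}\right)^{2 \cdot |S|} - 1 \approx 1 + 2 \cdot |S| \cdot \frac{\varepsilon}{\eta} - 1 = 2 \cdot |S| \cdot \frac{\varepsilon}{\eta}.$$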

We now illustrate with an example (on structurally equivalent Markov chains) where
the difference in the value function is $O(|S| \cdot \varepsilon)$, for small $\varepsilon$. Consider the Markov chain
defined on state space $S = \{s_0, s_1, \ldots, s_{2n-1}, s_{2n}\}$ as follows: states $s_0$ and $s_{2n}$ are
absorbing (states with self-loops of probability 1) and for a state $s_i$ with $1 \le i \le 2n - 1$ we
have

$$\delta(s_i)(s_{i-1}) = \frac{1}{2} + \varepsilon; \qquad \delta(s_i)(s_{i+1}) = \frac{1}{2} - \varepsilon;$$

i.e., we have a Markov chain defined on a line from 0 to $2n$ (with 0 and $2n$ absorbing
states) and the chain moves towards 0 with probability $\frac{1}{2} + \varepsilon$ and towards $2n$ with
probability $\frac{1}{2} - \varepsilon$ (see Fig. 4). Our goal is to estimate the probability to reach the state
$s_0$, and let $v_i$ denote the probability to reach $s_0$ from the starting state $s_i$. Then we have
the following simple recurrence for $1 \le i \le 2n - 1$:

$$v_i = \left(\frac{1}{2} + \varepsilon\right) \cdot v_{i-1} + \left(\frac{1}{2} - \varepsilon\right) \cdot v_{i+1};$$

and $v_0 = 1$ and $v_{2n} = 0$. We will consider $\varepsilon > 0$ such that $\varepsilon$ is very small and hence
higher-order terms (like $\varepsilon^2$) can be ignored. We claim that the values $v_i$ can be expressed
as the following recurrence: $v_{i+1} = \left(\frac{1}{2} + \varepsilon\right) \cdot c_i \cdot v_i$, where $c_i = \frac{4}{4 - c_{i+1}}$. The proof is
by induction and is shown below:

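$$\begin{aligned} v_i &= \left(\tfrac{1}{2} + \varepsilon\right) \cdot v_{i-1} + \left(\tfrac{1}{2} - \varepsilon\right) \cdot v_{i+1} \\ &= \left(\tfrac{1}{2} + \varepsilon\right) \cdot v_{i-1} + \left(\tfrac{1}{2} - \varepsilon\right) \cdot \left(\tfrac{1}{2} + \varepsilon\right) \cdot c_i \cdot v_i \quad \text{(by the inductive hypothesis } v_{i+1} = (\tfrac{1}{2} + \varepsilon) \cdot c_i \cdot v_i\text{)} \\ &= \left(\tfrac{1}{2} + \varepsilon\right) \cdot v_{i-1} + \left(\tfrac{1}{4} - \varepsilon^2\right) \cdot c_i \cdot v_i \\ &\approx \left(\tfrac{1}{2} + \varepsilon\right) \cdot v_{i-1} + \tfrac{1}{4} \cdot c_i \cdot v_i \quad \text{(ignoring } \varepsilon^2\text{)}. \end{aligned}$$

Fig. 4. Markov chains for Example 2.

It follows that $v_i = \left(\frac{1}{2} + \varepsilon\right) \cdot \frac{4}{4 - c_i} \cdot v_{i-1} = \left(\frac{1}{2} + \varepsilon\right) \cdot c_{i-1} \cdot v_{i-1}$. Hence we have

$$\begin{aligned} v_1 &= \left(\tfrac{1}{2} + \varepsilon\right) \cdot v_0 + \left(\tfrac{1}{2} - \varepsilon\right) \cdot v_2 \\ &= \left(\tfrac{1}{2} + \varepsilon\right) \cdot 1 + \left(\tfrac{1}{2} - \varepsilon\right) \cdot \left(\tfrac{1}{2} + \varepsilon\right) \cdot c_1 \cdot v_1 \\ &\approx \left(\tfrac{1}{2} + \varepsilon\right) + \tfrac{1}{4} \cdot c_1 \cdot v_1 \quad \text{(ignoring } \varepsilon^2\text{)}. \end{aligned}$$

Thus we obtain that $v_1 = \frac{4}{4 - c_1} \cdot \left(\frac{1}{2} + \varepsilon\right)$. Then we have $v_2 = \left(\frac{1}{2} + \varepsilon\right) \cdot c_1 \cdot v_1 =
\frac{4}{4 - c_1} \cdot c_1 \cdot \left(\frac{1}{2} + \varepsilon\right)^2$, and then $v_3 = \frac{4}{4 - c_1} \cdot c_1 \cdot c_2 \cdot \left(\frac{1}{2} + \varepsilon\right)^3$, and so on. Finally we obtain
$v_n$ as follows:

$$v_n = \frac{4}{4 - c_1} \cdot c_1 \cdot c_2 \cdots c_{n-1} \cdot \left(\frac{1}{2} + \varepsilon\right)^n.$$

Observe that for the Markov chain with $\varepsilon = 0$, the states $s_0$ and $s_{2n}$ are the recurrent states, and since the chain
is symmetric from $s_n$ (with $\varepsilon = 0$) the probabilities to reach $s_{2n}$ and $s_0$ must be equal,
and hence each is $\frac{1}{2}$. It follows that we must have

$$\frac{4}{4 - c_1} \cdot c_1 \cdot c_2 \cdots c_{n-1} = 2^{n-1}.$$

Hence $v_n = 2^{n-1} \cdot \left(\frac{1}{2} + \varepsilon\right)^n$, and we
have that for $\varepsilon > 0$, but very small, $v_n \approx \frac{1}{2} + n \cdot \varepsilon$. Thus the difference between the value
function when $\varepsilon = 0$ and when $\varepsilon > 0$ but very small is $n \cdot \varepsilon = O(|S| \cdot \varepsilon)$.
Also observe that the Markov chains obtained for $\varepsilon = 0$ and $\frac{1}{2} > \varepsilon > 0$ are structurally
equivalent. Thus the desired result follows. ∎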
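The constants $c_i$ in the derivation can be computed exactly. The sketch below assumes the recurrence $c_{i-1} = 4 / (4 - c_i)$ with the boundary value $c_{2n-1} = 0$, which is how the condition $v_{2n} = 0$ enters. It then checks that the leading factor of $v_n$ equals $2^{n-1}$. Function names are illustrative.

```python
from fractions import Fraction


def recurrence_constants(n):
    """c_i for i = 0, ..., 2n-1 from c_{2n-1} = 0 and c_{i-1} = 4 / (4 - c_i)."""
    c = [Fraction(0)] * (2 * n)
    for i in range(2 * n - 1, 0, -1):
        c[i - 1] = Fraction(4) / (4 - c[i])
    return c


def leading_constant(n):
    """(4 / (4 - c_1)) * c_1 * ... * c_{n-1}, the factor multiplying (1/2 + eps)^n."""
    c = recurrence_constants(n)
    k = Fraction(4) / (4 - c[1])
    for i in range(1, n):
        k *= c[i]
    return k


def first_order_estimate(n, eps):
    """2^(n-1) * (1/2 + eps)^n, the approximation of v_n obtained by ignoring eps^2."""
    return leading_constant(n) * (Fraction(1, 2) + Fraction(eps)) ** n
```

For $n = 5$ and $\varepsilon = 1/1000$ the estimate is within $10^{-4}$ of $\frac{1}{2} + n \cdot \varepsilon$.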



